
[SPARK-15149][EXAMPLE][DOC] update kmeans example#12925

Closed
zhengruifeng wants to merge 4 commits into apache:master from zhengruifeng:km_pe

Conversation

@zhengruifeng

@zhengruifeng zhengruifeng commented May 5, 2016

Contributor

What changes were proposed in this pull request?

A Python example for ml.kmeans already exists, but it is not included in the user guide.
1. Make small changes, such as adding the example_on / example_off markers
2. Add it to the user guide
3. Update the examples to read the data file directly

How was this patch tested?

manual tests
`./bin/spark-submit examples/src/main/python/ml/kmeans_example.py`

@SparkQA

SparkQA commented May 5, 2016


Test build #57857 has finished for PR 12925 at commit 221ea4d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 5, 2016


Test build #57859 has finished for PR 12925 at commit 5fa7f5f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon

Member

@dongjoon-hyun This one as well. Do you mind if I ask your thoughts on the component in the title? Making good examples for PRs will help all other contributors.

@zhengruifeng

zhengruifeng commented May 5, 2016

Contributor Author

@HyukjinKwon ok. I will change them to [EXAMPLE]

@zhengruifeng zhengruifeng changed the title from "[SPARK-15149][DOC] include python example for kmeans" to "[SPARK-15149][EXAMPLE] include python example for kmeans" May 5, 2016
@MLnick

MLnick commented May 5, 2016

Contributor

@zhengruifeng I prefer the style of bisecting_k_means_example.py, i.e., working with data = spark.read.text("data/mllib/kmeans_data.txt"). Could we harmonize this example with that one?

I will also comment on #11844 about harmonizing the Scala examples.
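
For readers following the thread, the reading style suggested here can be sketched roughly as below. This is a minimal illustration, not the code from the PR: `parse_point` and `load_with_spark` are hypothetical names, the data file is assumed to hold whitespace-separated floats on each line, and the Spark part assumes PySpark 2.x or later is installed.

```python
def parse_point(line):
    """Turn one text line such as "0.0 0.1 0.2" into a list of floats.

    Assumption: each line holds whitespace-separated numeric features.
    """
    return [float(token) for token in line.split()]


def load_with_spark(path):
    """Hypothetical helper: build a one-column "features" DataFrame from a text file.

    Assumes PySpark 2.x or later; the imports are local so that
    parse_point stays usable on a machine without Spark.
    """
    from pyspark.ml.linalg import Vectors
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("KMeansReadSketch").getOrCreate()
    # spark.read.text yields one row per line, with the text in column "value".
    lines = spark.read.text(path).rdd
    rows = lines.map(lambda row: (Vectors.dense(parse_point(row.value)),))
    return spark.createDataFrame(rows, ["features"])


if __name__ == "__main__":
    # Only the pure-Python parsing step is exercised here.
    print(parse_point("0.0 0.1 0.2"))
```

The resulting DataFrame could then be passed to `KMeans().setK(2).fit(...)` from `pyspark.ml.clustering`; keeping the parsing in a small function makes that step easy to check without starting a Spark session.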

@zhengruifeng

zhengruifeng commented May 5, 2016

Contributor Author

@MLnick OK. I will update these examples to read the data file.

@zhengruifeng

Contributor Author

@MLnick updated. Thanks for your comments.

@zhengruifeng

Contributor Author

Oh, I need to update the JavaKMeansExample and KMeansExample

@SparkQA

SparkQA commented May 5, 2016


Test build #57885 has finished for PR 12925 at commit 059a739.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 5, 2016


Test build #57888 has finished for PR 12925 at commit f9ff25a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sethah

sethah commented May 5, 2016

Contributor

Ah, I had a PR ready for this but didn't see you had created a Jira for it. I can review.

Contributor


Can you see my comments about this on #11844 and let me know?

@zhengruifeng zhengruifeng May 6, 2016

Contributor Author


I see it. I will bring the KMeans examples in line with the BisectingKMeans ones.

@zhengruifeng zhengruifeng changed the title from "[SPARK-15149][EXAMPLE] include python example for kmeans" to "[SPARK-15149][EXAMPLE][DOC] update kmeans example" May 7, 2016
@zhengruifeng

Contributor Author

data/mllib/sample_kmeans_data.txt was created in the BisectingKMeans examples.

@SparkQA

SparkQA commented May 8, 2016


Test build #58084 has finished for PR 12925 at commit d91cbe9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 9, 2016


Test build #58145 has finished for PR 12925 at commit d5f02c6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


import numpy as np
# $example on$
from pyspark.ml.clustering import KMeans, KMeansModel

Contributor


Don't need to import KMeansModel here.

@sethah

sethah commented May 9, 2016

Contributor

LGTM other than one minor comment and pending #11844

Run with:
bin/spark-submit examples/src/main/python/ml/kmeans_example.py <input> <k>

This example requires NumPy (http://www.numpy.org/).

@holdenk holdenk May 9, 2016

Contributor


nit: I believe this example still requires NumPy even though it isn't explicitly imported (see toArray, which is called inside clusterCenters and says it returns a NumPy array).

Contributor Author


Thanks. I will revert this removal.

@SparkQA

SparkQA commented May 11, 2016


Test build #58307 has finished for PR 12925 at commit 5020773.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zhengruifeng

Contributor Author

@MLnick Thanks. Updated

@SparkQA

SparkQA commented May 11, 2016


Test build #58309 has finished for PR 12925 at commit f2ff8d6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MLnick

MLnick commented May 11, 2016

Contributor

LGTM. I'll merge this once #11844 is merged.

@MLnick

MLnick commented May 11, 2016

Contributor

Merged to master and branch-2.0. Thanks!

asfgit pushed a commit that referenced this pull request May 11, 2016
## What changes were proposed in this pull request?
Python example for ml.kmeans already exists, but not included in user guide.
1,small changes like: `example_on` `example_off`
2,add it to user guide
3,update examples to directly read datafile

## How was this patch tested?
manual tests
`./bin/spark-submit examples/src/main/python/ml/kmeans_example.py`

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #12925 from zhengruifeng/km_pe.

(cherry picked from commit 8beae59)
Signed-off-by: Nick Pentreath <nickp@za.ibm.com>
@asfgit asfgit closed this in 8beae59 May 11, 2016
@zhengruifeng zhengruifeng deleted the km_pe branch May 11, 2016 08:05
6 participants